Soil Genesis and Classification
Mastaneh Rahimi Mashkaleh; Mohammad Amir Delavar; Mohammad Jamshidi
Abstract
Introduction: Imbalanced data remain a widespread and significant challenge for machine learning algorithms, and imbalanced data classification has therefore become an important research area in data mining. Imbalance, typically a small number of instances in one class and many in the others, poses substantial difficulties for machine learning algorithms, so data mining and machine learning practitioners are actively refining methods and models to classify imbalanced data more accurately. The principal objective of this study was to detect and classify samples from the minority classes correctly and thereby improve the accuracy of soil class prediction. The study was conducted in the southwestern part of Zanjan province.
Materials and Methods: A total of 148 soil profiles were excavated on a regular grid with an average spacing of 500 m (up to 700 m at some locations, based on expert recommendations). The samples were air-dried and transported to the laboratory, where physical and chemical analyses were performed on all of them, including soil texture, pH, calcium carbonate equivalent, cation exchange capacity, electrical conductivity, organic carbon and gypsum content. The profiles were then described and classified to the family level following the standards of the soil classification system. From 57 candidate covariates, including geomorphological and geological maps, a digital elevation model (DEM) and Landsat 8 satellite imagery, the most appropriate covariates for predicting soil classes were selected using principal component analysis (PCA) and expert knowledge.
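The abstract does not say exactly how PCA was used to pick covariates. A minimal sketch of one common screening approach, assumed here rather than taken from the study, standardizes the candidate covariates, runs PCA on their correlation matrix and shortlists the covariates with the largest absolute loadings on the leading components, leaving the final choice to expert judgment. The function name `screen_covariates` and its `n_components` and `per_component` parameters are hypothetical.

```python
import numpy as np


def screen_covariates(X, names, n_components=3, per_component=2):
    """Shortlist covariates by their absolute loadings on the leading principal components.

    X     : (n_profiles, n_covariates) covariate values extracted at the soil profiles.
    names : covariate names, one per column of X (assumes no column is constant).
    Returns the shortlisted names and the share of variance explained by each component.
    """
    X = np.asarray(X, dtype=float)
    # Standardize so covariates measured in different units contribute equally.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The covariance matrix of standardized data is the correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # sort components by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    shortlist = []
    for k in range(n_components):
        picked = 0
        # Walk covariates from the largest to the smallest absolute loading on component k.
        for j in np.argsort(np.abs(eigvecs[:, k]))[::-1]:
            if picked == per_component:
                break
            if names[j] not in shortlist:
                shortlist.append(names[j])
                picked += 1
    return shortlist, eigvals / eigvals.sum()


# Usage with synthetic values (not data from the study): 148 profiles, 8 covariates.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(148, 8))
names_demo = [f"cov{i}" for i in range(8)]
shortlist, explained = screen_covariates(X_demo, names_demo)
```

Loadings only rank covariates by how strongly they drive the main axes of variation; in this sketch the shortlist is a starting point for expert review, not a final selection.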
SAGA GIS and ENVI software were used to extract the environmental covariates. The soil-landscape relationship was modeled in RStudio using three algorithms, multinomial logistic regression (MNLR), random forest (RF) and boosted regression trees (BRT), together with an ensemble of these models built after data balancing. To assess model accuracy, the data were randomly split into a training set of 80% (118 profiles) and a validation set of 20% (30 profiles).
Results and Discussion: Ten covariates were selected as model inputs, comprising geomorphological map information, geological information and DEM-derived features, including analytical hillshading (AHS), sunrise, valley depth (VD), LS factor, channel network distance (CND), topographic wetness index (TWI) and multi-resolution ridge top flatness (MRRTF). Based on the profile data, the soils of the region were grouped at the subgroup level into five classes with an imbalanced distribution: Typic Calcixerepts, Typic Haploxerepts, Gypsic Haploxerepts, Typic Xerorthents and Lithic Xerorthents. Overall accuracy and Kappa were 65% and 0.32 for RF, 60% and 0.35 for BRT, and 65% and 0.41 for MNLR; after data balancing, the ensemble model reached 70% and 0.62, respectively. User's and producer's accuracies showed that, among the individual models, MNLR predicted the soil classes most accurately.
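The random 80/20 split and the two headline metrics can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the seed and the function names are hypothetical, and "Kappa" is taken to mean Cohen's kappa computed from the validation confusion matrix (rows = observed class, columns = predicted class).

```python
import numpy as np


def random_split(n, train_frac=0.8, seed=0):
    """Randomly split n profile indices into training and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(train_frac * n))
    return idx[:n_train], idx[n_train:]


def confusion_matrix(y_true, y_pred, classes):
    """Count validation profiles; rows = observed class, columns = predicted class."""
    pos = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[pos[t], pos[p]] += 1
    return cm


def overall_accuracy_and_kappa(cm):
    """Overall accuracy (observed agreement) and Cohen's kappa."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n  # share of correctly classified profiles
    # Agreement expected by chance from the row (observed) and column (predicted) totals.
    p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
    return p_o, (p_o - p_e) / (1 - p_e)


train_idx, val_idx = random_split(148)  # 118 training and 30 validation profiles
```

Kappa discounts the agreement expected by chance, which is high when a few classes dominate; this is why a model that mostly predicts the majority classes can reach 65% overall accuracy while its Kappa stays near 0.3.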
Although the ensemble model predicted the minority soil classes well, it gave lower accuracies than the individual MNLR model for some majority classes, especially Typic Haploxerepts and Typic Xerorthents, because the two weaker models, RF and BRT, contribute to it.
Conclusions: When applied individually, the learning algorithms did not predict the spatial distribution of soil classes with high accuracy. Combined into an ensemble model, they predicted soil classes more accurately than the individual models and reduced misclassification, especially at the subgroup level. Although each individual model performed best for particular soil classes, the ensemble consistently gave the best overall performance and accuracy, underscoring the value of ensemble modeling for improving spatial soil classification.
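The abstract does not describe how MNLR, RF and BRT were combined. One common option, assumed here, is soft voting: average the class probabilities predicted by each model (optionally weighted) and assign the class with the highest mean probability. The function `soft_vote` and the probabilities in the usage example are hypothetical; they only illustrate how two weaker members can outvote a stronger one at a given site, which is one way the drop for some majority classes described above can arise.

```python
import numpy as np


def soft_vote(prob_list, classes, weights=None):
    """Weighted average of class probabilities from several models.

    prob_list : list of (n_sites, n_classes) arrays, one per model, columns ordered as `classes`.
    weights   : optional per-model weights (equal weights if None).
    Returns the predicted class for each site and the averaged probability matrix.
    """
    P = np.stack([np.asarray(p, dtype=float) for p in prob_list])  # (n_models, n_sites, n_classes)
    w = np.ones(len(prob_list)) if weights is None else np.asarray(weights, dtype=float)
    avg = np.tensordot(w / w.sum(), P, axes=1)  # contract the model axis -> (n_sites, n_classes)
    return [classes[i] for i in avg.argmax(axis=1)], avg


# Hypothetical probabilities for one site and two classes (not study results).
classes = ["Typic Haploxerepts", "Typic Xerorthents"]
mnlr = [[0.60, 0.40]]
rf = [[0.30, 0.70]]
brt = [[0.45, 0.55]]
equal_labels, _ = soft_vote([mnlr, rf, brt], classes)
weighted_labels, _ = soft_vote([mnlr, rf, brt], classes, weights=[4, 1, 1])
```

With equal weights the two weaker members outvote MNLR (mean probabilities 0.45 and 0.55), whereas up-weighting MNLR restores its prediction; the weights here are arbitrary and would normally be tuned on held-out data.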
Soil Genesis and Classification
Mastaneh Rahimi Mashkale; Mohammad Amir Delavar; Mohammad Jamshidi; Amin Sharififar
Abstract
Despite the widespread use of digital soil maps, class imbalance degrades the classification performance of many machine learning algorithms and has therefore attracted the attention of many researchers. The aim of this research was to improve the classification of imbalanced soil data by applying a resampling pretreatment with three predictive models, random forest (RF), boosted regression trees (BRT) and multinomial logistic regression (MNLR), in part of Zanjan province, Iran. Sampling followed a regular grid with 500 m intervals, and 148 soil profiles were studied and classified. At the subgroup level, the soils of the region fell into five classes with an imbalanced distribution: Typic Calcixerepts, Typic Haploxerepts, Gypsic Haploxerepts, Typic Xerorthents and Lithic Xerorthents. Candidate environmental covariates were derived from geomorphological and geological maps, a digital elevation model (DEM) and remote sensing (RS) data and screened using principal component analysis (PCA) and expert knowledge; geomorphological map information, geological information and DEM-derived features were selected as the most effective variables for predicting soil classes and used as model inputs. Environmental covariates were extracted in ENVI and SAGA GIS, and the soil-landscape relationship was modeled with the above algorithms in RStudio. Resampling was applied to the minority and majority soil classes before modeling. With the original, imbalanced data, the minority classes were lost from the maps, and Kappa and overall accuracy were relatively low for RF (overall accuracy = 65%, Kappa = 0.32) and BRT (overall accuracy = 60%, Kappa = 0.35).
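The abstract names the resampling technique only in general terms (the results mention over- and under-sampling). A minimal sketch, assuming plain random resampling of every class to a common target size, draws minority classes with replacement and majority classes without replacement. The function name `resample_classes` and the default target (the mean class size) are hypothetical choices.

```python
import numpy as np


def resample_classes(X, y, target=None, seed=0):
    """Random over-sampling of small classes and under-sampling of large ones.

    Each class is resampled to `target` observations (default: mean class size).
    Classes smaller than the target are drawn with replacement (over-sampling);
    larger ones are drawn without replacement (under-sampling).
    """
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    if target is None:
        target = int(round(counts.mean()))
    chosen = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        chosen.append(rng.choice(idx, size=target, replace=n < target))
    chosen = np.concatenate(chosen)
    rng.shuffle(chosen)  # mix the classes so row order carries no information
    return X[chosen], y[chosen]


# Synthetic example: row i of X_demo holds the value i, so resampled rows can be traced back.
y_demo = np.array(["A"] * 10 + ["B"] * 2 + ["C"] * 3)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)
X_bal, y_bal = resample_classes(X_demo, y_demo)
```

In this sketch the resampling is meant for the training profiles only; keeping the validation profiles in their original proportions prevents duplicated profiles from leaking into validation and keeps the accuracy figures tied to the real class distribution.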
After resampling, both overall accuracy and the Kappa coefficient increased for all models. The BRT model gave an acceptable spatial prediction of soil subgroups, retaining the minority classes with a Kappa coefficient of 0.64 and an overall accuracy of 75%. Producer's accuracy (PA) and user's accuracy (UA) showed that the two minority classes, Gypsic Haploxerepts and Lithic Xerorthents, which were lost when RF and BRT were trained on the imbalanced data, improved markedly after balancing. With the treated data, these two classes were well predicted by RF (UA = 100% and 78%; PA = 75% and 88%) and BRT (UA = 60% and 70%; PA = 100% and 78%), compared with zero accuracy when the models were trained on imbalanced data. In contrast, validation of MNLR showed that, although the minority classes were retained after balancing, they were predicted less accurately. Overall, modeling with an imbalanced distribution of class observations produced uncertain maps in which minority classes were lost and accuracies were relatively poor. After treating the data by over- and under-sampling, all models retained the minority classes considerably better. Data resampling can therefore be a useful way to deal with imbalanced class observations and to produce more reliable digital soil maps.
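Producer's and user's accuracies are per-class ratios from the validation confusion matrix (rows = observed class, columns = predicted class). The sketch below, with a hypothetical function name and an invented three-class matrix rather than the study's results, shows why a minority class that a model never predicts gets a producer's accuracy of zero and no defined user's accuracy.

```python
import numpy as np


def producer_user_accuracy(cm):
    """Per-class producer's (PA) and user's (UA) accuracy.

    PA = correct / observed total of the class (1 - PA is the omission error).
    UA = correct / predicted total of the class (1 - UA is the commission error).
    A class with no predictions gets UA = nan because the ratio is undefined.
    """
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)
    observed = cm.sum(axis=1)
    predicted = cm.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        pa = np.where(observed > 0, correct / observed, np.nan)
        ua = np.where(predicted > 0, correct / predicted, np.nan)
    return pa, ua


# Invented matrix: the third (minority) class is observed 3 times but never predicted.
cm_demo = [[8, 2, 0],
           [1, 6, 0],
           [2, 1, 0]]
pa, ua = producer_user_accuracy(cm_demo)
```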
M. A. Delavar; A. Naderi
Abstract
Introduction: Sodic soils often form and develop over fairly large areas of flat, cultivable plains, especially in arid and semi-arid regions. Because of their unfavorable properties, slickspots adversely affect plant growth and, ultimately, human health. High levels of soluble and exchangeable sodium and of colloidal material are the main characteristics of sodic soils. Slickspots of various sizes are scattered across the flat, arable plains of Iran. The aim of this study was to evaluate the properties of sodic soils and the related soil-forming factors in the semi-arid soils of the Abyek plain.
Materials and Methods: The study area (35° 47′-35° 53′ N, 50° 31′-50° 33′ E) is located southeast of Abyek city, Ghazvin province. The main physiographic unit was a piedmont plain, divided by elevation into three topographic zones, 1180-1190, 1160-1170 and 1140-1150 m above sea level, referred to as the upper, middle and flat parts, respectively. Based on topography and site properties, 13 soil profiles were excavated in these zones, and all profiles were described according to the USDA standard soil description manual.
Results and Discussion: The pH of the saturation extract ranged from 8.6 to 9.1, 9.8 to 9.7 and 9.1 to 10.1 in the upper, middle and flat areas, respectively. Field observations showed gravelly and subangular blocky structures in the surface and subsurface horizons of the upper parts, respectively, subangular blocky and massive structures in the subsurface horizons of the middle parts, and subangular blocky and columnar structures in the profiles of the flat areas. In the upper lands, despite a small topographic difference (5 to 10 m), exchangeable sodium and electrical conductivity were low, and no saline or sodic soils were observed.
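The statement that no saline or sodic soils occurred in the upper lands rests on electrical conductivity and exchangeable sodium. A minimal sketch of that reasoning, assuming the widely cited US Salinity Laboratory (Handbook 60) limits of 4 dS/m for the saturation-extract EC and 15% for the exchangeable sodium percentage (ESP); the abstract does not state which criteria were used, and the function names and example values are hypothetical.

```python
def exchangeable_sodium_percentage(exch_na, cec):
    """ESP (%) = 100 * exchangeable Na / CEC, both in cmol(+)/kg."""
    return 100.0 * exch_na / cec


def salinity_sodicity_class(ec, esp, ec_limit=4.0, esp_limit=15.0):
    """Group a horizon by saturation-extract EC (dS/m) and ESP (%).

    The default limits follow the commonly cited US Salinity Laboratory scheme;
    sources differ on whether a value exactly at a limit counts as saline or sodic.
    """
    saline = ec > ec_limit
    sodic = esp > esp_limit
    if saline and sodic:
        return "saline-sodic"
    if saline:
        return "saline"
    if sodic:
        return "sodic"
    return "non-saline, non-sodic"


# Hypothetical horizon: 2.5 cmol(+)/kg exchangeable Na, CEC 12.5 cmol(+)/kg, EC 1.8 dS/m.
esp_demo = exchangeable_sodium_percentage(2.5, 12.5)  # 20.0 %
label_demo = salinity_sodicity_class(1.8, esp_demo)   # "sodic"
```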
These soils were classified as Xeric Haplocambids. In the middle part, with a 2 to 5 m difference in elevation, soils were classified as Sodic Xeric Haplocambids and Sodic Xeric Haplocalcids. The white spots observed within the sodic soils were classified as Xeric and Vertic Natrargids. Compared with adjacent areas, carbonate and bicarbonate concentrations were relatively high in soils of the flat areas, leading to a considerable increase in soil pH. This may indicate accumulation of sodium carbonate salts in these soils. The carbonate and bicarbonate anions in the middle areas were probably due to the development of sodicization in those soils. X-ray diffraction (XRD) identified illite, montmorillonite, chlorite and palygorskite as the clay minerals in the soil horizons. Illite was found in all horizons of the flat areas but decreased sharply, and this decrease was accompanied by an increase in smectite in the poorly drained natric horizons. Clay coatings in the natric horizons were confirmed by micromorphology and scanning electron microscopy (SEM). Clay accumulations on the external surfaces of aggregates and on pore walls in the flat areas revealed clay translocation (eluviation-illuviation). Because of the high soluble and exchangeable sodium, conditions favored clay translocation in these soils even in the presence of lime.
Conclusion: The main soil-forming factors of the sodic soils were soil position on the piedmont, local relief, lateral and vertical movement of water and soluble salts from neighboring areas into downslope lands, and wind-deposited salts and sodium-bearing minerals. Slickspots and the related soils are one of the major landscape features of the Abyek plain. The sodic soils of the plain formed in the absence of a shallow groundwater table.
Other environmental factors, such as microrelief, landform position, lateral movement of water and soluble salts, and windborne sediments, played an essential role in the formation of sodic soils. The results indicated that sodicization is advancing toward adjacent lands and that the absence of gypsum has accelerated this process in these areas. Mineralogical studies also showed smectite in the natric horizons, where poor drainage allowed smectite neoformation, and micromorphological studies confirmed evidence of clay movement from the upper parts of the profiles.